What I was wondering: how many towns do we have with two official names?

I’ve heard many stories of tourists confused by our city names: they want to take the train from Ghent to “Liège”, but there is only one to “Luik”. Or they are driving south towards France and their GPS tells them to follow the direction “Mons”, but as long as they are on Flemish territory the signs say “Bergen”.
Luik/Liège, Bergen/Mons… Each pair is the same city. The first is the Flemish name, the second the French.
What I didn’t know is how many towns and cities carry two official names.

Data source

I found everything I needed on (this website) from the Belgian government.

Cleaning the data

Starting by loading the packages needed:

#packages for the data exploration
library(tidyverse)
library(readxl)
library(ggplot2)

#packages for the maps
library(sp)
## Warning: package 'sp' was built under R version 3.4.3
library(tmap)
## Warning: package 'tmap' was built under R version 3.4.3
library(viridisLite)
## Warning: package 'viridisLite' was built under R version 3.4.3
library(leaflet)
## Warning: package 'leaflet' was built under R version 3.4.3


#Importing the data
raw_data <- read_excel("TF_SOC_POP_STRUCT_2017_tcm325-283761.xlsx", sheet=1)

#Keeping only the variables needed
data <- raw_data %>% 
  select(contains("MUNTY"), TX_RGN_DESCR_NL, CD_SEX, TX_NATLTY_NL, TX_CIV_STS_NL, CD_AGE, MS_POPULATION)
colnames(data) <- c("REFNIS", "TownNL", "TownFR", "Region", "Sex", "Nationality", "MaritalStatus", "Age", "Population")

#Translating Region names to English
data$Region <- data$Region %>% 
  str_replace("Vlaams Gewest", "Flanders") %>% 
  str_replace("Waals Gewest", "Wallonia") %>% 
  str_replace("Brussels Hoofdstedelijk Gewest", "Brussels agglomeration")

After importing the data, I found it contained a lot of administrative data I didn’t need. Additionally, the data does not contain a total population by town, because it is divided into demographic subsets. A bit of dplyr filtering showed me that in my home town there are fewer than 30 people with the same characteristics as me (female, unmarried, Belgian, age 34), but that’s not really the most interesting part.
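As a minimal sketch of that kind of filter, assuming a toy table with the same column names as `data` (the town name, category labels and counts below are made up for illustration; the real file uses its own codes):

```r
library(dplyr)

# Toy stand-in for `data`; all values are hypothetical, not from the real file
toy <- data.frame(
  TownNL        = c("Townville", "Townville", "Townville", "Othertown"),
  Sex           = c("F", "F", "M", "F"),
  Nationality   = c("Belgian", "Belgian", "Belgian", "Belgian"),
  MaritalStatus = c("Unmarried", "Married", "Unmarried", "Unmarried"),
  Age           = c(34, 34, 34, 34),
  Population    = c(27, 41, 30, 12),
  stringsAsFactors = FALSE
)

# Keep one demographic subset in one town and add up its population
same_as_me <- toy %>%
  filter(TownNL == "Townville", Sex == "F", Nationality == "Belgian",
         MaritalStatus == "Unmarried", Age == 34) %>%
  summarise(n_people = sum(Population))

same_as_me$n_people
```

Each row of the source data is one demographic subset, so summing `Population` over the matching rows gives the head count.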

Using dplyr I created a population table, and immediately added a column to compare the town names in Flemish and French.

#Creating a dataframe with total population for each town, and adding a column to see whether they have the same name
popdata <- data %>% 
  group_by(TownNL, TownFR, Region, REFNIS) %>% 
  summarise(population=sum(Population)) %>% 
  arrange(desc(population)) %>%
  mutate(SameName = TownNL==TownFR) %>% 
  ungroup()

#Noticing an issue: 
popdata%>%
  filter(Region=="Flanders") %>% 
  filter(!SameName) %>% 
  print(n=11)
## # A tibble: 45 x 6
##                         TownNL                        TownFR   Region
##                          <chr>                         <chr>    <chr>
##  1                   Antwerpen                        Anvers Flanders
##  2                        Gent                          Gand Flanders
##  3                      Brugge                        Bruges Flanders
##  4                      Leuven                       Louvain Flanders
##  5                    Mechelen                       Malines Flanders
##  6               Aalst (Aalst)                 Alost (Alost) Flanders
##  7 Sint-Niklaas (Sint-Niklaas) Saint-Nicolas (Saint-Nicolas) Flanders
##  8                    Kortrijk                      Courtrai Flanders
##  9                    Oostende                       Ostende Flanders
## 10                   Roeselare                       Roulers Flanders
## 11      Beveren (Sint-Niklaas)       Beveren (Saint-Nicolas) Flanders
## # ... with 34 more rows, and 3 more variables: REFNIS <chr>,
## #   population <dbl>, SameName <lgl>

Luckily, I noticed an issue quickly: some town names are annotated with their district. Beveren has the same name in Flemish and French, but its district gets translated, so the comparison flags it as different. To get rid of the districts, I removed any text between brackets and redid the comparison to find out which town names really differ.

#Removing the districts between brackets
popdata$TownNL <- str_replace(popdata$TownNL, pattern="\\s\\(.+\\)", replacement="")
popdata$TownFR <- str_replace(popdata$TownFR, pattern="\\s\\(.+\\)", replacement="")

#Reassessing whether the names are the same, and removing the previous sameName column to avoid confusion
popdata <- popdata %>% 
  mutate(DiffName = TownNL != TownFR) %>%
  select(TownNL, TownFR, DiffName, population, Region, REFNIS)
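
To see what the pattern does, here is a small sketch on three names taken from the output above. The variable names (`town_nl`, `clean_nl`, and so on) are only for illustration and are not part of the original script:

```r
library(stringr)

# Sample names taken from the printed table above
town_nl <- c("Aalst (Aalst)", "Beveren (Sint-Niklaas)", "Gent")
town_fr <- c("Alost (Alost)", "Beveren (Saint-Nicolas)", "Gand")

# Same pattern as above: one whitespace character, then everything in brackets
pattern <- "\\s\\(.+\\)"
clean_nl <- str_replace(town_nl, pattern, "")
clean_fr <- str_replace(town_fr, pattern, "")

clean_nl
clean_fr

# After cleaning, only Beveren has the same name in both languages
same_name <- clean_nl == clean_fr
same_name
```

Names without brackets, like Gent, pass through unchanged because the pattern simply does not match.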



A glimpse of the data exploration

There are 95 towns/cities with two different official names, which is 16% of the total number of towns. Contrary to what some people assume, the share is more or less similar in both regions: 13% of Flemish towns also have an official French name, and 16% of Walloon towns also have an official Flemish name. Only Brussels, an officially bilingual region, has a much higher percentage of “double names”.

#How many have two different names?
sum(popdata$DiffName)
## [1] 95
mean(popdata$DiffName)
## [1] 0.1612903
#by region
popdata %>% 
  group_by(Region) %>% 
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
           Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 3 x 6
##                   Region NTowns N_SameName N_DiffName Prop_SameName
##                    <chr>  <int>      <int>      <int>         <dbl>
## 1 Brussels agglomeration     19          6         13          0.32
## 2               Flanders    308        269         39          0.87
## 3               Wallonia    262        219         43          0.84
## # ... with 1 more variables: Prop_DiffName <dbl>
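
The percentages quoted above can be re-derived from the counts in this table. A small sketch, with the counts copied by hand from the output (the vector names are mine):

```r
# Counts copied from the regional table above
towns     <- c(Brussels = 19, Flanders = 308, Wallonia = 262)
diff_name <- c(Brussels = 13, Flanders = 39,  Wallonia = 43)

# Share of towns with two names per region, in percent
share_by_region <- round(100 * diff_name / towns)
share_by_region

# Overall share across the country, in percent
share_overall <- round(100 * sum(diff_name) / sum(towns))
share_overall
```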

Mapping the towns with two official names

Using tmap I created two initial maps: one that shows the regions of Belgium, and a second one for comparison that highlights only the towns with two official names.

#Importing SPdataframe for Belgium
data("BE_ADMIN_MUNTY", package="BelgiumMaps.StatBel")

#Merging my 2017 data with the SPdataframe
mapdata <- merge(BE_ADMIN_MUNTY, popdata, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")


#Making a file containing only the towns with different names 
popdata_DiffName <- popdata %>% 
  filter(DiffName==TRUE)
  
mapdataDiffName <- merge(BE_ADMIN_MUNTY, popdata_DiffName, by.x = "CD_MUNTY_REFNIS", by.y = "REFNIS")


#Creating a colour palette
virpalette <- rev(viridis(3))


#Plot different regions
regionplot<- tm_shape(mapdata) +
  tm_fill(col="Region", palette=virpalette,
          title = "Regions in Belgium")+
  tm_polygons(id="TownNL")+
  tm_layout(legend.position = c("left", "bottom"))


#Plot to show those with different names by region
nameplot <- tm_shape(mapdataDiffName) +
  tm_fill(col="Region", palette=virpalette, id="TownNL", 
          colorNA = "gray90", textNA="Same name", 
          title = "Different regional town names",legend.position = c("left", "bottom" ),
          popup.vars = c("TownNL","TownFR", "population", "Reason"))+
  tm_polygons(id="TownNL", "TownFR")+
  tm_layout(legend.position = c("left", "bottom"))


tmap_arrange(regionplot, nameplot)
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).


One thing to notice: there is a slightly higher concentration of towns with two official names around the language border, but that doesn’t really explain the full picture.

Distilling the reason for two official town names

Reason 1: Brussels, an officially bilingual region

In the table above it was obvious that the Brussels region has a much higher share of towns with two official names: 68% versus the country average of 16%. Given Brussels’ bilingual status, that should not come as a surprise. I was actually more surprised to realize that there are still 6 that only have their former Flemish name, and some of them, like “Ganshoren”, aren’t really that easy to pronounce.

#Checking the data on Brussels
popdata %>% 
  filter(Region=="Brussels agglomeration") %>% 
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
##   NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
##    <int>      <int>      <int>         <dbl>         <dbl>
## 1     19          6         13          0.32          0.68
#List of names for Brussels
popdata %>% 
  filter(Region=="Brussels agglomeration") %>% 
  group_by(DiffName) %>%
  arrange(desc(DiffName), desc(population))
## # A tibble: 19 x 6
## # Groups:   DiffName [2]
##                    TownNL                TownFR DiffName population
##                     <chr>                 <chr>    <lgl>      <dbl>
##  1                Brussel             Bruxelles     TRUE     176545
##  2             Schaarbeek            Schaerbeek     TRUE     133042
##  3    Sint-Jans-Molenbeek  Molenbeek-Saint-Jean     TRUE      96629
##  4                 Elsene               Ixelles     TRUE      86244
##  5                  Ukkel                 Uccle     TRUE      82307
##  6                  Vorst                Forest     TRUE      55746
##  7 Sint-Lambrechts-Woluwe  Woluwe-Saint-Lambert     TRUE      55216
##  8            Sint-Gillis          Saint-Gilles     TRUE      50471
##  9    Sint-Pieters-Woluwe   Woluwe-Saint-Pierre     TRUE      41217
## 10               Oudergem             Auderghem     TRUE      33313
## 11    Sint-Joost-ten-Node Saint-Josse-ten-Noode     TRUE      27115
## 12    Watermaal-Bosvoorde   Watermael-Boitsfort     TRUE      24871
## 13    Sint-Agatha-Berchem Berchem-Sainte-Agathe     TRUE      24701
## 14             Anderlecht            Anderlecht    FALSE     118241
## 15                  Jette                 Jette    FALSE      51933
## 16              Etterbeek             Etterbeek    FALSE      47414
## 17                  Evere                 Evere    FALSE      40394
## 18              Ganshoren             Ganshoren    FALSE      24596
## 19             Koekelberg            Koekelberg    FALSE      21609
## # ... with 2 more variables: Region <chr>, REFNIS <chr>
#Adding a column to note down the reason for different names
reason_BXL <- popdata %>% 
  filter(Region=="Brussels agglomeration") %>% 
  filter(DiffName) %>%
  mutate(Reason = "Brussels")

Reason 2: Larger cities

Cities are generally more important, and I would have guessed that most of our cities have two official names. Just looking at the difference in average population between towns that have two names (TRUE) and those that don’t, there clearly is a skew towards more populous towns. A quick plot in ggplot confirms this: grey shows all the towns in Belgium by population size on a logarithmic scale. I coloured those that have two names in green.

popdata %>%
  group_by(DiffName) %>% 
  summarise(mean=mean(population), median=median(population))
## # A tibble: 2 x 3
##   DiffName     mean median
##      <lgl>    <dbl>  <dbl>
## 1    FALSE 14744.06  11383
## 2     TRUE 42510.78  24701
#Plotting the distribution of town sizes, highlighting towns with two names
ggplot()+
  geom_histogram(data=popdata, aes(x=population), fill="grey", alpha=0.6)+
  geom_histogram(data=subset(popdata, DiffName==TRUE), aes(x=population), fill="cadetblue4", alpha=1)+
  scale_x_log10()+
  labs(x= "Population", y="Number of towns", title="Size of towns with two official names amongst all towns in Belgium")

I took a shortcut to define our cities: the 10% most populous towns, which in practice means a population above roughly 34,000.

#10% largest towns and cities in Belgium
quantile(popdata$population, probs = seq(from = 0, to = 1, by = .1))
##       0%      10%      20%      30%      40%      50%      60%      70% 
##     89.0   4372.2   6341.8   8308.4  10268.4  12123.0  14649.6  18473.6 
##      80%      90%     100% 
##  23259.6  34189.8 520504.0
#Proportion of Cities with different names
popdata %>% 
  filter(population > 34000) %>%
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
##   NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
##    <int>      <int>      <int>         <dbl>         <dbl>
## 1     60         27         33          0.45          0.55
#Adding a reason column 
reason_city <- popdata %>% 
  filter(population > 34000) %>%
  filter(Region != "Brussels agglomeration") %>% 
  filter(DiffName) %>% 
  mutate(Reason = "City")
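
The cutoff of 34000 used above is a rounded version of the 90th percentile. A possible sketch of deriving the threshold from the data instead of hard-coding it, shown on made-up population figures (`toy_pop` and `cutoff` are illustrative names, not part of the original script):

```r
library(dplyr)

# Hypothetical populations for ten towns (not real data)
toy_pop <- data.frame(
  TownNL     = paste0("Town", 1:10),
  population = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 100000),
  stringsAsFactors = FALSE
)

# 90th percentile of population, used as the "city" threshold
cutoff <- quantile(toy_pop$population, probs = 0.9, names = FALSE)

# Towns above the threshold count as cities
cities <- toy_pop %>%
  filter(population > cutoff)

cities$TownNL
```

With the real data this would use 34189.8 rather than 34000, so the number of towns counted as cities could differ slightly from the 60 shown above.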



Reason 3: German-speaking region (and towns with German language facilities)

After World War I, the peace treaty of Versailles listed the annexation of 9 German towns into Belgium as compensation. They make up our third language region, as German is still their main language today. Given that German and Dutch are both Germanic languages with a lot of similarities, it would make sense that the Flemish use the German town names, while the French have changed some of them.

#Listing the German-speaking communes and the two additional towns with German facilities
germanspeaking <- c("Eupen", "Kelmis", "Lontzen", "Raeren", "Amel", "Büllingen", "Burg-Reuland", "Bütgenbach", 
                    "Sankt Vith", "Malmedy", "Weismes")

#Proportion of German-speaking towns with different names
popdata %>% 
  filter(TownNL %in% germanspeaking) %>%
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
##   NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
##    <int>      <int>      <int>         <dbl>         <dbl>
## 1     11          5          6          0.45          0.55
#German towns with two official names
popdata %>% 
  filter(TownNL %in% germanspeaking) %>%
  filter(DiffName==TRUE) %>% 
  print(n=nrow(.))
## # A tibble: 6 x 6
##       TownNL      TownFR DiffName population   Region REFNIS
##        <chr>       <chr>    <lgl>      <dbl>    <chr>  <chr>
## 1     Kelmis La Calamine     TRUE      10964 Wallonia  63040
## 2 Sankt Vith  Saint-Vith     TRUE       9661 Wallonia  63067
## 3    Weismes      Waimes     TRUE       7493 Wallonia  63080
## 4 Bütgenbach  Butgenbach     TRUE       5583 Wallonia  63013
## 5       Amel     Amblève     TRUE       5523 Wallonia  63001
## 6  Büllingen    Bullange     TRUE       5489 Wallonia  63012
#Adding a reason column 
reason_german <- popdata %>% 
  filter(TownNL %in% germanspeaking) %>%
  filter(DiffName) %>% 
  mutate(Reason = "German region")



Reason 4: Towns in Flanders or Wallonia with official language facilities

Always a topic for debate in Belgium: the towns with official language facilities. These are towns that belong to one region but they have some degree of bilingual facilities (it’s complicated!).

#Listing all towns with language facilities
faciliteiten <- c("Bever", "Drogenbos", "Herstappe", "Kraainem", "Linkebeek", "Mesen", "Ronse", 
                  "Sint-Genesius-Rode", "Spiere-Helkijn", "Voeren", "Wemmel", "Wezembeek-Oppem", 
                  "Edingen", "Komen-Waasten", "Moeskroen", "Vloesberg")

#Proportion of towns with language facilities that have different names
popdata %>% 
  filter(TownNL %in% faciliteiten) %>%
  summarise(NTowns=n(), N_SameName=n()-sum(DiffName), N_DiffName=sum(DiffName), 
            Prop_SameName =1-round(mean(DiffName),2), Prop_DiffName=round(mean(DiffName),2))
## # A tibble: 1 x 5
##   NTowns N_SameName N_DiffName Prop_SameName Prop_DiffName
##    <int>      <int>      <int>         <dbl>         <dbl>
## 1     16          6         10          0.38          0.62
#Which towns have different names?
popdata %>% 
  filter(TownNL %in% faciliteiten) %>%
  filter(DiffName==TRUE) %>% 
  print(n=nrow(.))
## # A tibble: 10 x 6
##                TownNL             TownFR DiffName population   Region
##                 <chr>              <chr>    <lgl>      <dbl>    <chr>
##  1          Moeskroen           Mouscron     TRUE      57773 Wallonia
##  2              Ronse             Renaix     TRUE      26092 Flanders
##  3 Sint-Genesius-Rode Rhode-Saint-Genèse     TRUE      18231 Flanders
##  4      Komen-Waasten   Comines-Warneton     TRUE      18102 Wallonia
##  5            Edingen            Enghien     TRUE      13563 Wallonia
##  6             Voeren            Fourons     TRUE       4129 Flanders
##  7          Vloesberg            Flobecq     TRUE       3426 Wallonia
##  8              Bever            Biévène     TRUE       2160 Flanders
##  9     Spiere-Helkijn  Espierres-Helchin     TRUE       2142 Flanders
## 10              Mesen           Messines     TRUE       1049 Flanders
## # ... with 1 more variables: REFNIS <chr>
#Adding a reason column
reason_facilities <- popdata %>% 
  filter(TownNL %in% faciliteiten) %>%
  filter(DiffName) %>% 
  anti_join(reason_city) %>% 
  mutate(Reason = "Language facilities")



To summarize, there are a few reasons why towns have different official names:

* They are part of a bilingual region (Brussels)
* They are a larger city
* They are part of the German region
* They have language facilities
* They are close to the language border

Making a final map

Along the way I added additional reason columns, which I now want to merge into the mapdata:

#Creating a reason column for all other towns with two names
reason_other <- popdata %>% 
  filter(DiffName) %>% 
  anti_join(reason_city) %>% 
  anti_join(reason_BXL) %>% 
  anti_join(reason_german) %>% 
  anti_join(reason_facilities) %>% 
  mutate(Reason = "Other")


#Merging reasons into one dataframe
reason <- bind_rows(reason_BXL, reason_city, reason_german, reason_facilities, reason_other)

#Searching for duplicates before join
reason %>% 
  group_by(REFNIS) %>% 
  filter(n() > 1)
## # A tibble: 0 x 7
## # Groups:   REFNIS [0]
## # ... with 7 variables: TownNL <chr>, TownFR <chr>, DiffName <lgl>,
## #   population <dbl>, Region <chr>, REFNIS <chr>, Reason <chr>
#Joining reasons into the main dataframe
popdata_reason <- left_join(popdata, reason)
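
The chain of `anti_join()` calls gives every town exactly one reason, with earlier reasons taking priority. An alternative sketch of the same idea uses `case_when()`, which picks the first matching condition. This is not what the script above does: the toy data, and the helper flags `is_german` and `has_facilities`, are hypothetical and only illustrate the priority order:

```r
library(dplyr)

# Toy towns with hypothetical values; `is_german` and `has_facilities`
# are made-up helper flags, not columns of the original data
toy <- data.frame(
  TownNL         = c("A", "B", "C", "D", "E"),
  Region         = c("Brussels agglomeration", "Flanders", "Wallonia",
                     "Flanders", "Wallonia"),
  population     = c(50000, 60000, 9000, 2000, 8000),
  is_german      = c(FALSE, FALSE, TRUE, FALSE, FALSE),
  has_facilities = c(FALSE, TRUE, FALSE, TRUE, FALSE),
  DiffName       = c(TRUE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# case_when() uses the first condition that is TRUE, so the order of
# the conditions sets the priority between overlapping reasons
toy_reason <- toy %>%
  filter(DiffName) %>%
  mutate(Reason = case_when(
    Region == "Brussels agglomeration" ~ "Brussels",
    population > 34000                 ~ "City",
    is_german                          ~ "German region",
    has_facilities                     ~ "Language facilities",
    TRUE                               ~ "Other"
  ))

toy_reason$Reason
```

Town B is both a city and a town with facilities in this toy data; it ends up as "City" because that condition comes first, which mirrors how Moeskroen is handled by the `anti_join(reason_city)` step above.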